ClInt: a Bilingual Spanish-Catalan Spoken Corpus of Clinical Interviews
نویسندگان
چکیده
In this paper we present ClInt (Clinical Interview), a bilingual Spanish-Catalan spoken corpus that contains 15 hours of clinical interviews. It consists of audio files aligned with multiple-level transcriptions comprising orthographic, phonetic and morphological information, as well as linguistic and extralinguistic encoding. This is a previously non-existent resource for these languages and it offers a wide-ranging exploitation potential in a broad variety of disciplines such as Linguistics, Natural Language Processing and related fields.
منابع مشابه
Monolingual and Bilingual Spanish-Catalan Speech Recognizers Developed from SpeechDat Databases
Under the SpeechDat specifications, the Spanish member of SpeechDat consortium has recorded a Catalan database that includes one thousand speakers. This communication describes some experimental work that has been carried out using both the Spanish and the Catalan speech material. A speech recognition system has been trained for the Spanish language using a selection of the phonetically balance...
متن کاملVowel categorization during word recognition in bilingual toddlers.
Toddlers' and preschoolers' knowledge of the phonological forms of words was tested in Spanish-learning, Catalan-learning, and bilingual children. These populations are of particular interest because of differences in the Spanish and Catalan vowel systems: Catalan has two vowels in a phonetic region where Spanish has only one. The proximity of the Spanish vowel to the Catalan ones might pose sp...
متن کاملTowards the Use of Word Stems and Suffixes for Statistical Machine Translation
In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translations from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus which consists of spontaneously spoken dialogues in the domain of travelling and app...
متن کاملA Large Spanish-Catalan Parallel Corpus Release for Machine Translation
We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5 M parallel sentences (around 180 M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catala...
متن کاملBilingual aligned corpora for speech to speech translation for Spanish, English and Catalan
In the framework of the EU-funded Project LC-STAR, a set of Language Resources (LR) for all the Speech to Speech Translation components (Speech recognition, Machine Translation and Speech Synthesis) was developed. This paper deals with the development of bilingual corpora in Spanish, US English and Catalan. The corpora were obtained from spontaneous dialogues in one of these three languages whi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Procesamiento del Lenguaje Natural
دوره 45 شماره
صفحات -
تاریخ انتشار 2010